Search Results: "enrico"

4 January 2023

Enrico Zini: Staticsite redesign

These are some notes about my redesign work in staticsite 2.x.

Mapping constraints and invariants

I started keeping notes of constraints and invariants, and this helped a lot in keeping bounds on the cognitive effort of the design. I particularly liked how mapping the set of constraints added during site generation helped break down processing into a series of well defined steps. Code that handles each step now has a specific task, and can rely on clear assumptions.

Declarative page metadata

I designed page metadata as declarative fields added to the Page class. I used typed descriptors for the fields, so that metadata fields can now have logic and validation, and are self-documenting! This is the core of the Field implementation (a minimal sketch of the general pattern appears at the end of this post).

Lean core

I tried to implement as much as possible in feature plugins, leaving to the staticsite core only what is essential to create the structure for plugins to build on. The core provides a tree structure, an abstract Page object that can render to a file and resolve references to other pages, a Site that holds settings and controls the various loading steps, and little else. The only type of content supported by the core is static asset files: Markdown, RestructuredText, images, taxonomies, feeds, directory indices, and so on, are all provided via feature plugins.

Feature plugins

Feature plugins work by providing functions to be called at the various loading steps, and mixins to be added to site pages. Mixins provided by feature plugins can add new declarative metadata fields, and extend Page methods: this ends up being very clean and powerful, and plays decently well with mypy's static type checking, too! See for example the code of the alias feature, which allows a page to declare aliases that redirect to it, useful for example when moving content around. It has a mixin (AliasPageMixin) that adds an aliases field that holds a list of page paths. During the "generate" step, when autogenerated pages can be created, the aliases feature iterates through all pages that defined an aliases metadata, and generates the corresponding redirection pages.

Self-documenting code

Staticsite can list loaded features, features can list the page subclasses that they use, and pages can list metadata fields. As a result, each feature, each type of page, and each field of each page can generate documentation about itself: the staticsite reference is autogenerated in that way, mostly from Feature, Page, and Field docstrings.

Understand the language, stay close to the language

Python has matured massively in the last years, and I like to stay on top of the language and standard library release notes for each release. I like how what used to be dirty hacks have now found a clean way into the language:
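As an illustration of the declarative metadata described above, here is a minimal, hypothetical sketch of a typed descriptor field and a feature mixin using it. This is not staticsite's actual Field or Page API, just the general pattern it builds on:

# Illustrative sketch only, not staticsite's actual Field/Page classes
from typing import Any, Optional

class Field:
    """Typed, self-documenting page metadata field."""

    def __init__(self, *, default: Any = None, doc: str = "") -> None:
        self.default = default
        self.__doc__ = doc

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name

    def __get__(self, obj: Any, owner: Optional[type] = None) -> Any:
        if obj is None:
            return self
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj: Any, value: Any) -> None:
        obj.__dict__[self.name] = self.validate(value)

    def validate(self, value: Any) -> Any:
        # Subclasses add type checks and normalisation here
        return value

class ListField(Field):
    """Field holding a list of values, such as page paths."""

    def validate(self, value: Any) -> list:
        if not isinstance(value, list):
            raise TypeError(f"{self.name} must be a list")
        return value

class Page:
    """Bare-bones stand-in for the abstract Page object."""

class AliasPageMixin(Page):
    # Roughly what a feature mixin contributes: a declarative, documented
    # field that the feature later reads during the "generate" step
    aliases = ListField(default=[], doc="paths that redirect to this page")

A documentation generator can then walk the Page subclasses and read each Field's __doc__, which is where the self-documenting reference described above comes from.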

Enrico Zini: Released staticsite 2.x

In theory I wanted to announce the release of staticsite 2.0, but then I found bugs that prevented me from writing this post, so I'm also releasing 2.1, 2.2 and 2.3 :grin: staticsite is the static site generator that I ended up writing after giving other generators a try. I did a big round of cleanup of the code, which among other things allowed me to implement incremental builds. It turned out that staticsite is fast enough that incremental builds are not really needed; however, a bug in caching rendered markdown had kept me from noticing that. Now I fixed that bug, too, and I can choose between running staticsite fast, and ridiculously fast. My favourite bit of this work is the internal cleanup: I found a way to simplify the core design massively, and now the core and plugin system is simple enough that I can explain it, and I'll probably write a blog post or two about it in the next days. On top of that, staticsite is basically clean with mypy running in strict mode! Getting there was a great ride which prompted a lot of thinking about designing code properly, as mypy is pretty good at flagging clumsy hacks. If you want to give it a try, check out the small tutorial A new blog in under one minute.

3 January 2023

Enrico Zini: Things I learnt in December 2022

Python: typing.overload

typing.overload makes it easier to type functions whose behaviour depends on their input types. Functions marked with @overload are ignored by Python at runtime and only used by the type checker:
from typing import overload

@overload
def process(response: None) -> None:
    ...
@overload
def process(response: int) -> tuple[int, str]:
    ...
@overload
def process(response: bytes) -> str:
    ...
def process(response):
    # <actual implementation>
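With these overloads, a type checker derives the return type from the argument type. As a small illustration (reveal_type is a mypy construct, also available in typing from Python 3.11 onwards):

reveal_type(process(None))    # the checker reports None
reveal_type(process(42))      # the checker reports tuple[int, str]
reveal_type(process(b"raw"))  # the checker reports str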
Python's multiprocessing and deadlocks

Python's multiprocessing is prone to deadlocks in a number of conditions. In my case, the running program was a standard single-process, non-threaded script, but it used complex native libraries which might have been the triggers for the deadlocks. The suggested workaround is using set_start_method("spawn"), but when we tried it we hit serious performance penalties. Lesson learnt: multiprocessing is good for prototypes, and may end up being too hacky for production. In my case, I was already generating small Python scripts corresponding to worker tasks, which were useful for reproducing and debugging Magics issues, so I switched to running those as the actual workers. In the future, this may come in handy for dispatching work to HPC nodes, too. Here's a parallel execution scheduler based on asyncio that I wrote to run them, which may come in handy on other projects, too.
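The actual scheduler is the one linked above; purely as a hedged sketch of the general approach (the max_workers limit and the idea of passing script paths on the command line are illustrative assumptions), something like this runs the generated worker scripts as subprocesses with bounded concurrency:

import asyncio
import sys

async def run_script(path: str, semaphore: asyncio.Semaphore) -> int:
    """Run one worker script as a subprocess, limited by the semaphore."""
    async with semaphore:
        proc = await asyncio.create_subprocess_exec(sys.executable, path)
        return await proc.wait()

async def run_all(paths: list[str], max_workers: int = 4) -> list[int]:
    """Run all worker scripts, at most max_workers at a time."""
    semaphore = asyncio.Semaphore(max_workers)
    return await asyncio.gather(*(run_script(p, semaphore) for p in paths))

if __name__ == "__main__":
    # Exit codes of the workers, in the order the scripts were given
    print(asyncio.run(run_all(sys.argv[1:])))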

18 December 2022

Freexian Collaborators: Monthly report about Debian Long Term Support, November 2022 (by Anton Gladky)

Like each month, have a look at the work funded by Freexian's Debian LTS offering.

Debian LTS contributors In November, 15 contributors have been paid to work on Debian LTS, their reports are available:
  • Abhijith PA did 0.0h (out of 14.0h assigned), thus carrying over 14.0h to the next month.
  • Anton Gladky did 6.0h (out of 15.0h assigned), thus carrying over 9.0h to the next month.
  • Ben Hutchings did 9.0h (out of 24.0h assigned), thus carrying over 15.0h to the next month.
  • Chris Lamb did 18.0h (out of 18.0h assigned).
  • Dominik George did 10.0h (out of 0h assigned and 24.0h from previous period), thus carrying over 14.0h to the next month.
  • Emilio Pozuelo Monfort did 0.0h (out of 38.0h assigned and 19.5h from previous period), thus carrying over 57.5h to the next month.
  • Enrico Zini did 0.0h (out of 0h assigned and 8.0h from previous period), thus carrying over 8.0h to the next month.
  • Helmut Grohne did 17.5h (out of 20.0h assigned).
  • Markus Koschany did 40.0h (out of 40.0h assigned).
  • Ola Lundqvist did 7.5h (out of 11.0h assigned and 5.0h from previous period), thus carrying over 8.5h to the next month.
  • Roberto C. Sánchez did 20.25h (out of 0.75h assigned and 31.25h from previous period), thus carrying over 11.75h to the next month.
  • Stefano Rivera did 2.5h (out of 0h assigned and 17.0h from previous period), thus carrying over 14.5h to the next month.
  • Sylvain Beucler did 35.5h (out of 23.0h assigned and 34.5h from previous period), thus carrying over 22.0h to the next month.
  • Thorsten Alteholz did 14.0h (out of 14.0h assigned).
  • Utkarsh Gupta did 41.0h (out of 32.5h assigned and 25.0h from previous period), thus carrying over 16.5h to the next month.

Evolution of the situation In November, we released 43 DLAs, fixing 183 CVEs. We currently have 63 packages in dla-needed.txt that are waiting for updates, which is 19 fewer than the previous month. We're excited to announce that two Debian Developers, Tobias Frost and Guilhem Moulin, have completed the on-boarding process and will begin contributing to LTS as of December 2022. Welcome aboard!

Thanks to our sponsors Sponsors that joined recently are in bold.

30 November 2022

Enrico Zini: Things I learnt in November 2022

Debian: Python:
>>> name="test"
>>> print(f" name= ")
name='test'
>>> print(f" 3*8= ")
3*8=24
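As a related aside (a general fact about this f-string debugging syntax, available since Python 3.8, not something from the original post): the = marker can be combined with a format spec:

>>> price = 1234.5678
>>> print(f"{price=:.2f}")
price=1234.57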
Leaflet:

29 November 2022

Sam Hartman: Introducing Carthage

For the past four years, I've been working on Carthage, a free-software Infrastructure as Code framework. We've finally reached a point where it makes sense to talk about Carthage and what it can do. This is the first in a series of blog posts to introduce Carthage, discuss what it can do and show how it works.

Why Another IAC Framework

It seems everywhere you look, there are products designed to support the IAC pattern. On the simple side, you could check a Containerfile into Git. Products like Terraform and Vagrant allow you to template cloud infrastructure and VMs. There are more commercial offerings than I can keep up with. We were disappointed by what was out there when we started Carthage. Other products have improved, but for many of our applications we're happy with what Carthage can build. The biggest challenge we ran into is that products wanted us to specify things at the wrong level. For some of our cyber training work we wanted to say things like "We want 3 blue teams, each with a couple defended networks, a red team, and some neutral infrastructure for red to exploit." Yet the tools we were trying to use wanted to lay things out at the individual machine/container level. We found ourselves contemplating writing a program to generate input for some other IAC tool. Things were worse for our internal testing. Sometimes we'd be shipping hardware to a customer. But sometimes we'd be virtualizing that build out in a lab. Sometimes we'd be doing a mixture. So we wanted to completely separate the descriptions of machines, networks, and software from any of the information about whether that was realized on hardware, VMs, containers, or a mixture.

Dimensional Breakdown

In discussing Carthage with Enrico Zini, he pointed me at Cognitive Dimensions of Notation as a way to think about how Carthage approaches the IAC problem. I'm more interested in the idea of breaking down a design along dimensions that allow examining the design space than I am in particular adherence to Green's original dimensions.

Low Viscosity, High Abstraction Reuse

One of the guiding principles is that we want to be able to reuse different components at different scales and in different environments. These include being able to do things like:

Hidden Dependencies

To accomplish these abstraction goals, dependencies need to be non-local. For example, a software role might need to integrate with a directory if a directory is present in the environment. When writing the role, no one is going to know which directory to use, nor whether a directory is present. Taking that as an explicit input into the role is error-prone when the role is combined into large abstract units (bigger roles or collections of machines). Instead it is better to have a non-local dependency, and to find the directory if it is available. We accomplish this using dependency injection. In addition to being non-local, dependencies are sometimes hidden. It is very easy to overwhelm our cognitive capacity with even a fairly simple IAC description. An effective notation allows us to focus on the parts that matter when working with a particular part of the description. I've found hiding dependencies, especially indirect dependencies, to be essential in building complex descriptions. Obviously, tools are required for examining these dependencies as part of debugging.

First Class Modeling

Clearly one of the goals of IAC descriptions is to actually build and manage infrastructure.
It turns out that there are all sorts of things you want to do with the description well before you instantiate the infrastructure. You might want to query the description to build network diagrams, understand interdependencies, or even build an inventory/bill of materials. We often find ourselves building Ansible inventory, switch configurations, DNS zones, and all sorts of configuration artifacts. These artifacts may be installed into infrastructure that is instantiated by the description, but they may be consumed in other ways. Allowing the artifacts to be consumed externally means that you can avoid pre-commitment and focus on whatever part of the description you originally want to work on. You may use an existing network at first. Later the IAC description may replace that, or perhaps it never will. As a result, Carthage separates modeling from instantiation. The model can generally be built and queried without needing to interact with clouds, VMs, or containers. We've actually found it useful to build Carthage layouts that cannot ever be fully instantiated, for example because they never specify details like whether a model should be instantiated on a container or VM, or what kind of technology will realize a modeled network. This allows developing roles before the machines that use them, or focusing on how machines will interact and how the network will be laid out before the details of installing on specific hardware. The modeling separation is by far the difference I value most between Carthage and other systems.

A Tool for Experts

In Neal Stephenson's essay "In the Beginning Was the Command Line", Stephenson points out that the kind of tools experts need are not the same tools that beginners need. The illustration of why a beginner might not be satisfied with a Hole Hawg drill caught my attention. Carthage is a tool for experts. Despite what cloud providers will tell you, IAC is not easy. Doubly so when you start making reusable components. Trying to hide that, or focusing on making things easy to get started, can make it harder for experts to efficiently solve the problems they are facing. When we have faced trade-offs between making Carthage easy to pick up and making it powerful for expert users, we have chosen to support the experts. That said, Carthage today is harder to pick up than it needs to be. It's a relatively new project with few external users as of this time. Our documentation and examples need improvement, just like every project at this level of maturity. Similarly, as the set of things people try to do expands, we will doubtless run into bugs that our current test cases don't cover. So Carthage absolutely will get easier to learn and use than it is today. Also, we've already had success building beginner-focused applications on top of Carthage. For our cyber training, we built web applications on top of Carthage that made rebuilding and exploring infrastructure easy. We've had success using relatively well understood tools like Ansible as integration and customization points for Carthage layouts. But in all these cases, when the core layout had significant reusable components and significant complexity in the networking, only an IAC expert was going to be able to maintain and develop that layout.

What Carthage can do

Carthage has a number of capabilities today. One of Carthage's strengths is its extensible design. Abstract interfaces make it easy to add new virtualization platforms, cloud services, and support for various ways of managing real hardware.
This approach has been validated by incrementally adding support for virtualization architectures and cloud services. As development has progressed, adding new integrations continues to get faster, because we are able to reuse existing infrastructure. Today, Carthage can model: Carthage has excellent facilities for dealing with images on which VMs and containers can be based, although it does have a bit of a Debian/Ubuntu bias in how it thinks about images: When instantiating infrastructure, Carthage can work with: We have also looked at Oracle Cloud and, I believe, Openstack, although that code is not merged. Future posts will talk about core Carthage concepts and how to use Carthage to build infrastructure.


19 November 2022

Freexian Collaborators: Monthly report about Debian Long Term Support, October 2022 (by Raphaël Hertzog)

Like each month, have a look at the work funded by Freexian's Debian LTS offering.

Debian LTS contributors In October, 15 contributors have been paid to work on Debian LTS, their reports are available:
  • Abhijith PA did 14.0h (out of 2.0h assigned and 12.0h from previous period).
  • Anton Gladky did 20.0h (out of 19.0h assigned and 1.0h from previous period).
  • Ben Hutchings did 9.0h (out of 0h assigned and 9.0h from previous period).
  • Chris Lamb did 18.0h (out of 18.0h assigned).
  • Dominik George did 0.0h (out of 0h assigned and 24.0h from previous period), thus carrying over 24.0h to the next month.
  • Emilio Pozuelo Monfort did 40.5h (out of 58.0h assigned and 2.0h from previous period), thus carrying over 19.5h to the next month.
  • Enrico Zini did 0.0h (out of 0h assigned and 8.0h from previous period), thus carrying over 8.0h to the next month.
  • Helmut Grohne did 15.0h (out of 15.0h assigned).
  • Markus Koschany did 40.0h (out of 40.0h assigned).
  • Ola Lundqvist did 7.0h (out of 12.0h assigned), thus carrying over 5.0h to the next month.
  • Roberto C. Sánchez did 0.75h (out of 1.0h assigned and 31.0h from previous period), thus carrying over 31.25h to the next month.
  • Stefano Rivera did 12.5h (out of 9.0h assigned and 26.0h from previous period), thus carrying over 22.5h to the next month.
  • Sylvain Beucler did 25.5h (out of 31.5h assigned and 28.5h from previous period), thus carrying over 34.5h to the next month.
  • Thorsten Alteholz did 14.0h (out of 14.0h assigned).
  • Utkarsh Gupta did 35.0h (out of 38.0h assigned and 22.0h from previous period), thus carrying over 25.0h to the next month.

Evolution of the situation In October, we have released 42 DLAs, closing 106 CVEs. At the moment we have 82 packages in dla-needed.txt waiting for updates. We are continuously working on updating our infrastructure, trying to document all of our changes in the git repository. Most of the packages there now have continuous integration (CI) pipelines.

Thanks to our sponsors Sponsors that joined recently are in bold.

11 November 2022

Enrico Zini: Sharing argparse arguments with subcommands

argparse subcommands are great, but they have a quirk: options are only available right after the subcommand that defines them. So if, for example, you add the --verbose / -v argument to your main parser, and you have subcommands, you need to give the -v option before the subcommand name. For example, given this script:
#!/usr/bin/python3
import argparse
parser = argparse.ArgumentParser(description="test")
parser.add_argument("-v", "--verbose", action="store_true")
subparsers = parser.add_subparsers(dest="handler", required=True)
subparsers.add_parser("test")
args = parser.parse_args()
print(args.verbose)
You get this behaviour:
$ ./mycmd test
False
$ ./mycmd -v test
True
$ ./mycmd test -v
usage: mycmd [-h] [-v] {test} ...
mycmd: error: unrecognized arguments: -v
This sometimes makes sense, and many other times it's really annoying, since the user has to remember at which level an option was defined. Last night some pieces clicked in my head, and I created a not-too-dirty ArgumentParser subclass that adds a shared option to arguments, which propagates them to subparsers:
#!/usr/bin/python3
from hacks import argparse
parser = argparse.ArgumentParser(description="test")
parser.add_argument("-v", "--verbose", action="store_true", shared=True)
subparsers = parser.add_subparsers(dest="handler", required=True)
subparsers.add_parser("test")
args = parser.parse_args()
print(args.verbose)
And finally, -v can be given at all levels:
$ ./mycmd test
False
$ ./mycmd -v test
True
$ ./mycmd test -v
True
It even works recursively, forwarding arguments to sub-subparsers, but at the moment it can only do so for store_true and store kinds of arguments.
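As a rough, hypothetical sketch of how such a subclass could work (this is not the actual hacks.argparse module): record the arguments declared with shared=True, and re-add them to every subparser with a suppressed default, so that whichever level received the option wins:

import argparse

class SharedArgumentParser(argparse.ArgumentParser):
    def __init__(self, *args, **kwargs):
        self.shared_args = []  # list of (args, kwargs) to repeat in subparsers
        super().__init__(*args, **kwargs)

    def add_argument(self, *args, shared: bool = False, **kwargs):
        if shared:
            # Remember the definition so subparsers can repeat it
            self.shared_args.append((args, kwargs))
        return super().add_argument(*args, **kwargs)

    def add_subparsers(self, **kwargs):
        kwargs.setdefault("parser_class", SharedArgumentParser)
        subparsers = super().add_subparsers(**kwargs)
        shared_args = self.shared_args
        real_add_parser = subparsers.add_parser

        def add_parser(name, **kw):
            parser = real_add_parser(name, **kw)
            for s_args, s_kwargs in shared_args:
                # SUPPRESS keeps the subparser from clobbering a value
                # that was already given before the subcommand name
                parser.add_argument(*s_args, shared=True,
                                    **{**s_kwargs, "default": argparse.SUPPRESS})
            return parser

        subparsers.add_parser = add_parser
        return subparsers

Used as a drop-in replacement for argparse.ArgumentParser in the first script above, this gives the behaviour shown in the output: -v is accepted both before and after the subcommand name.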

31 August 2022

Raphaël Hertzog: Freexian's report about Debian Long Term Support, July 2022

A Debian LTS logo
Like each month, have a look at the work funded by Freexian's Debian LTS offering. Debian project funding No major updates on running projects.
Two [1, 2] projects are in the pipeline now.
The Tryton project is in a review phase. The Gradle project is still a work in progress. In July, we put aside 2389 EUR to fund Debian projects. We're looking forward to receiving more projects from various Debian teams! Learn more about the rationale behind this initiative in this article. Debian LTS contributors In July, 14 contributors have been paid to work on Debian LTS, their reports are available: Evolution of the situation In July, we have released 3 DLAs. July was the period when Debian Stretch had already moved to ELTS status, but Debian Buster was still in the hands of the security team. Many LTS members used this time to update internal infrastructure, documentation and some internal tickets. Now we are ready to take the next release into our hands: Buster! Thanks to our sponsors Sponsors that joined recently are in bold.

26 July 2022

Raphaël Hertzog: Freexian's report about Debian Long Term Support, June 2022

A Debian LTS logo
Like each month, have a look at the work funded by Freexian's Debian LTS offering. Debian project funding No major updates on running projects.
Two [1, 2] projects are in the pipeline now.
The Tryton project is in a review phase. The Gradle project is still a work in progress. In June, we put aside 2254 EUR to fund Debian projects. We're looking forward to receiving more projects from various Debian teams! Learn more about the rationale behind this initiative in this article. Debian LTS contributors In June, 15 contributors have been paid to work on Debian LTS, their reports are available: Evolution of the situation In June we released 27 DLAs.

This is a special month, in which we have two releases (stretch and jessie) as ELTS and NO release as LTS. Buster is still handled by the security team and will probably be handed over to LTS at the beginning of August. During this month we are updating the infrastructure and documentation, and improving our internal processes to switch to a new release.
Many developers have just returned from DebConf22, held in Prizren, Kosovo! Many (E)LTS members could meet face-to-face and discuss some technical and social topics! An LTS BoF also took place, where the project was introduced (link to video).
Thanks to our sponsors Sponsors that joined recently are in bold. We are pleased to welcome Alter Way, whose support of Debian is publicly acknowledged at the highest level; see this French quote from Alter Way's CEO.

20 July 2022

Enrico Zini: Deconstruction of the DAM hat

Further reading Talk notes Intro Debian Account Managers Responsibility for official membership What DAM is not Unexpected responsibilities DAM warnings DAM warnings? House rules Interpreting house rules Governance by bullying How about the Community Team? How about DAM? How about the DPL? Concentrating responsibility Empowering developers What needs to happen

23 June 2022

Raphaël Hertzog: Freexian's report about Debian Long Term Support, May 2022

A Debian LTS logo
Like each month, have a look at the work funded by Freexian's Debian LTS offering. Debian project funding Two [1, 2] projects are in the pipeline now. The Tryton project is in a final phase. The Gradle project is fighting with technical difficulties. In May, we put aside 2233 EUR to fund Debian projects. We're looking forward to receiving more projects from various Debian teams! Learn more about the rationale behind this initiative in this article. Debian LTS contributors In May, 14 contributors have been paid to work on Debian LTS, their reports are available: Evolution of the situation In May we released 49 DLAs. The security tracker currently lists 71 packages with a known CVE and the dla-needed.txt file has 65 packages needing an update. The number of paid contributors increased significantly: we are pleased to welcome our latest team members Andreas Rönnquist, Dominik George, Enrico Zini and Stefano Rivera. It is worth pointing out that we are getting close to the end of the LTS period for Debian 9. After June 30th, no new security updates will be made available on security.debian.org. We are preparing to take over Debian 10 Buster for the next two years and to make this process as smooth as possible. But Freexian and its team of paid Debian contributors will continue to maintain Debian 9 going forward for the customers of the Extended LTS offer. If you have Debian 9 servers to keep secure, it's time to subscribe! You might not have noticed, but Freexian formalized a mission statement where we explain that our purpose is to help improve Debian. For this, we want to fund work time for the Debian developers that recently joined Freexian as collaborators. The Extended LTS and the PHP LTS offers are built following a model that will help us to achieve this if we manage to have enough customers for those offers. So consider subscribing: you help your organization, but you also help Debian! Thanks to our sponsors Sponsors that joined recently are in bold.

9 June 2022

Enrico Zini: Updating cbqt for bullseye

Back in 2017 I did work to set up a cross-building toolchain for Qt Creator that takes advantage of Debian's packaging for all the dependency ecosystem. It ended with cbqt, a little script that sets up a chroot to hold cross-build-dependencies, to avoid conflicting with packages in the host system, and sets up a qmake alternative to make use of them. Today I'm dusting off that work, to ensure it works on Debian bullseye. Resetting Qt Creator To make things reproducible, I wanted to reset Qt Creator's configuration. Besides purging and reinstalling the package, one needs to manually remove: Updating cbqt Easy start, change the distribution for the chroot:
-DIST_CODENAME = "stretch"
+DIST_CODENAME = "bullseye"
Adding LIBDIR Something else does not work:
Test$ qmake-armhf -makefile
Info: creating stash file  /Test/.qmake.stash
Test$ make
[...]
/usr/bin/arm-linux-gnueabihf-g++ -Wl,-O1 -Wl,-rpath-link, /armhf/lib/arm-linux-gnueabihf -Wl,-rpath-link, /armhf/usr/lib/arm-linux-gnueabihf -Wl,-rpath-link, /armhf/usr/lib/ -o Test main.o mainwindow.o moc_mainwindow.o    /armhf/usr/lib/arm-linux-gnueabihf/libQt5Widgets.so  /armhf/usr/lib/arm-linux-gnueabihf/libQt5Gui.so  /armhf/usr/lib/arm-linux-gnueabihf/libQt5Core.so -lGLESv2 -lpthread
/usr/lib/gcc-cross/arm-linux-gnueabihf/10/../../../../arm-linux-gnueabihf/bin/ld: cannot find -lGLESv2
collect2: error: ld returned 1 exit status
make: *** [Makefile:146: Test] Error 1
I figured that now I also need to set QMAKE_LIBDIR and not just QMAKE_RPATHLINKDIR:
--- a/cbqt
+++ b/cbqt
@@ -241,18 +241,21 @@ include(../common/linux.conf)
 include(../common/gcc-base-unix.conf)
 include(../common/g++-unix.conf)
+QMAKE_LIBDIR += {chroot.abspath}/lib/arm-linux-gnueabihf
+QMAKE_LIBDIR += {chroot.abspath}/usr/lib/arm-linux-gnueabihf
+QMAKE_LIBDIR += {chroot.abspath}/usr/lib/
 QMAKE_RPATHLINKDIR += {chroot.abspath}/lib/arm-linux-gnueabihf
 QMAKE_RPATHLINKDIR += {chroot.abspath}/usr/lib/arm-linux-gnueabihf
 QMAKE_RPATHLINKDIR += {chroot.abspath}/usr/lib/
Now it links again:
Test$ qmake-armhf -makefile
Test$ make
/usr/bin/arm-linux-gnueabihf-g++ -Wl,-O1 -Wl,-rpath-link, /armhf/lib/arm-linux-gnueabihf -Wl,-rpath-link, /armhf/usr/lib/arm-linux-gnueabihf -Wl,-rpath-link, /armhf/usr/lib/ -o Test main.o mainwindow.o moc_mainwindow.o   -L /armhf/lib/arm-linux-gnueabihf -L /armhf/usr/lib/arm-linux-gnueabihf -L /armhf/usr/lib/  /armhf/usr/lib/arm-linux-gnueabihf/libQt5Widgets.so  /armhf/usr/lib/arm-linux-gnueabihf/libQt5Gui.so  /armhf/usr/lib/arm-linux-gnueabihf/libQt5Core.so -lGLESv2 -lpthread
Making it work in Qt Creator Time to try it in Qt Creator, and sadly it fails:
 /armhf/usr/lib/arm-linux-gnueabihf/qt5/mkspecs/features/toolchain.prf:76: Variable QMAKE_CXX.COMPILER_MACROS is not defined.
QMAKE_CXX.COMPILER_MACROS is not defined I traced it to this bit in armhf/usr/lib/arm-linux-gnueabihf/qt5/mkspecs/features/toolchain.prf (nonrelevant bits deleted):
isEmpty($${target_prefix}.COMPILER_MACROS) {
    msvc {
        # ...
    } else: gcc|ghs {
        vars = $$qtVariablesFromGCC($$QMAKE_CXX)
    }
    for (v, vars) {
        # ...
        $${target_prefix}.COMPILER_MACROS += $$v
    }
    cache($${target_prefix}.COMPILER_MACROS, set stash)
} else {
    # ...
}
It turns out that qmake is not able to realise that the compiler is gcc, so vars does not get set, nothing is set in COMPILER_MACROS, and qmake fails. Reproducing it on the command line When run manually, however, qmake-armhf worked, so it would be good to know how Qt Creator is actually running qmake. Since it frustratingly does not show what commands it runs, I'll have to strace it:
strace -e trace=execve --string-limit=123456 -o qtcreator.trace -f qtcreator
And there it is:
$ grep qmake- qtcreator.trace
1015841 execve("/usr/local/bin/qmake-armhf", ["/usr/local/bin/qmake-armhf", "-query"], 0x56096e923040 /* 54 vars */) = 0
1015865 execve("/usr/local/bin/qmake-armhf", ["/usr/local/bin/qmake-armhf", " /Test/Test.pro", "-spec", "arm-linux-gnueabihf", "CONFIG+=debug", "CONFIG+=qml_debug"], 0x7f5cb4023e20 /* 55 vars */) = 0
I run the command manually and indeed I reproduce the problem:
$ /usr/local/bin/qmake-armhf Test.pro -spec arm-linux-gnueabihf CONFIG+=debug CONFIG+=qml_debug
 /armhf/usr/lib/arm-linux-gnueabihf/qt5/mkspecs/features/toolchain.prf:76: Variable QMAKE_CXX.COMPILER_MACROS is not defined.
I try removing options until I find the one that breaks it and... now it's always broken! Even manually running qmake-armhf, like I did earlier, stopped working:
$ rm .qmake.stash
$ qmake-armhf -makefile
 /armhf/usr/lib/arm-linux-gnueabihf/qt5/mkspecs/features/toolchain.prf:76: Variable QMAKE_CXX.COMPILER_MACROS is not defined.
Debugging toolchain.prf I tried purging and reinstalling qtcreator, and recreating the chroot, but qmake-armhf stayed broken. I'll let that be, and try to debug toolchain.prf. By grepping for gcc in the mkspecs directory, I managed to figure out that: Sadly, I failed to find reference documentation for QMAKE_COMPILER's syntax and behaviour. I also failed to find why qmake-armhf worked earlier, and I am also failing to restore the system to a situation where it works again. Maybe I dreamt that it worked? Did I have some manual change lying around from some previous fiddling with things? Anyway, at least now I have the fix:
--- a/cbqt
+++ b/cbqt
@@ -248,7 +248,7 @@ QMAKE_RPATHLINKDIR += {chroot.abspath}/lib/arm-linux-gnueabihf
 QMAKE_RPATHLINKDIR += {chroot.abspath}/usr/lib/arm-linux-gnueabihf
 QMAKE_RPATHLINKDIR += {chroot.abspath}/usr/lib/
-QMAKE_COMPILER          = {chroot.arch_triplet}-gcc
+QMAKE_COMPILER          = gcc {chroot.arch_triplet}-gcc
 QMAKE_CC                = /usr/bin/{chroot.arch_triplet}-gcc
Fixing a compiler mismatch warning In setting up the kit, Qt Creator also complained that the compiler from qmake did not match the one configured in the kit. That was easy to fix, by pointing at the host system cross-compiler in qmake.conf:
 QMAKE_COMPILER          = {chroot.arch_triplet}-gcc
-QMAKE_CC                = {chroot.arch_triplet}-gcc
+QMAKE_CC                = /usr/bin/{chroot.arch_triplet}-gcc
 QMAKE_LINK_C            = $$QMAKE_CC
 QMAKE_LINK_C_SHLIB      = $$QMAKE_CC
-QMAKE_CXX               = {chroot.arch_triplet}-g++
+QMAKE_CXX               = /usr/bin/{chroot.arch_triplet}-g++
 QMAKE_LINK              = $$QMAKE_CXX
 QMAKE_LINK_SHLIB        = $$QMAKE_CXX
Updated setup instructions Create an armhf environment:
sudo cbqt ./armhf --create --verbose
Create a qmake wrapper that builds with this environment:
sudo ./cbqt ./armhf --qmake -o /usr/local/bin/qmake-armhf
Install the build-dependencies that you need:
# Note: :arch is added automatically to package names if no arch is explicitly specified
sudo ./cbqt ./armhf --install libqt5svg5-dev libmosquittopp-dev qtwebengine5-dev
Build with qmake Use qmake-armhf instead of qmake and it works perfectly:
qmake-armhf -makefile
make
Set up Qt Creator Configure a new Kit in Qt Creator:
  1. Tools/Options, then Kits, then Add
  2. Name: armhf (or anything you like)
  3. In the Qt Versions tab, click Add then set the path of the new Qt to /usr/local/bin/qmake-armhf. Click Apply.
  4. Back in the Kits, select the Qt version you just created in the Qt version field
  5. In Compilers, select the ARM versions of GCC. If they do not appear, install crossbuild-essential-armhf, then in the Compilers tab click Re-detect and then Apply to make them available for selection
  6. Dismiss the dialog with "OK": the new kit is ready
Now you can choose the default kit to build and run locally, and the armhf kit for remote cross-development. I tried looking at sdktool to automate this step, and it requires a nontrivial amount of work to do it reliably, so these manual instructions will have to do. Credits This has been done as part of my work with Truelite.

18 March 2022

Enrico Zini: Context-dependent logger in Python

This is a common logging pattern in Python, to have loggers related to module names:
import logging
log = logging.getLogger(__name__)
class Bill:
    def load_bill(self, filename: str):
        log.info("%s: loading file", filename)
I often, however, find myself wanting loggers related to something context-dependent, like the kind of file that is being processed. For example, I'd like to log the loading of a bill when it is done by the expenses module, and not when it is done by the printing module. I came up with a little hack that keeps the same API as before, and allows propagating a context-dependent logger to the code being called:
# Call this file log.py
from __future__ import annotations
import contextlib
import contextvars
import logging
_log: contextvars.ContextVar[logging.Logger] = contextvars.ContextVar('log', default=logging.getLogger())
@contextlib.contextmanager
def logger(name: str):
    """
    Set a default logger for the duration of this context manager
    """
    old = _log.set(logging.getLogger(name))
    try:
        yield
    finally:
        _log.reset(old)
def debug(*args, **kw):
    _log.get().debug(*args, **kw)
def info(*args, **kw):
    _log.get().info(*args, **kw)
def warning(*args, **kw):
    _log.get().warning(*args, **kw)
def error(*args, **kw):
    _log.get().error(*args, **kw)
And now I can do this:
from . import log
#  
    with log.logger("expenses"):
        bill = load_bill(filename)
# This code did not change!
class Bill:
    def load_bill(self, filename: str):
        log.info("%s: loading file", filename)
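A small usage note: nested contexts restore the previous logger on exit, and because this is built on contextvars the active logger is also tracked correctly across asyncio tasks. For example, with the log module above:

with log.logger("expenses"):
    log.info("using the 'expenses' logger")
    with log.logger("printing"):
        log.info("now using the 'printing' logger")
    log.info("back to the 'expenses' logger")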

3 March 2022

Enrico Zini: Migrating from procmail to sieve

Anarcat's "procmail considered harmful" post convinced me to get my act together and finally migrate my venerable procmail based setup to sieve. My setup was nontrivial, so I migrated with an intermediate step in which sieve scripts would by default pipe everything to procmail, which allowed me to slowly move rules from procmailrc to sieve until nothing remained in procmailrc. Here's what I did. Literature review https://brokkr.net/2019/10/31/lets-do-dovecot-slowly-and-properly-part-3-lmtp/ has a guide quite aligned with current Debian, and could be a starting point to get an idea of the work to do. https://wiki.dovecot.org/HowTo/PostfixDovecotLMTP is way more terse, but more aligned with my intentions. Reading the former helped me in understanding the latter. https://datatracker.ietf.org/doc/html/rfc5228 has the full Sieve syntax. https://doc.dovecot.org/configuration_manual/sieve/pigeonhole_sieve_interpreter/ has the list of Sieve features supported by Dovecot. https://doc.dovecot.org/settings/pigeonhole/ has the reference on Dovecot's sieve implementation. https://raw.githubusercontent.com/dovecot/pigeonhole/master/doc/rfc/spec-bosch-sieve-extprograms.txt is the hard to find full reference for the functions introduced by the extprograms plugin. Debugging tools: Backup of all mails processed One thing I did with procmail was to generate a monthly mailbox with all incoming email, with something like this:
BACKUP="/srv/backupts/test-`date +%Y-%m`.mbox"
:0c
$BACKUP
I did not find an obvious way in sieve to create monthly mailboxes, so I redesigned that system using Postfix's always_bcc feature, piping everything to an archive user. I'll then recreate the monthly archiving using a chewmail script that I can simply run via cron. Configure dovecot
apt install dovecot-sieve dovecot-lmtpd
I added this to the local dovecot configuration:
service lmtp {
  unix_listener /var/spool/postfix/private/dovecot-lmtp {
    user = postfix
    group = postfix
    mode = 0666
  }
}
protocol lmtp {
  mail_plugins = $mail_plugins sieve
}
plugin {
  sieve = file:~/.sieve;active=~/.dovecot.sieve
}
This makes Dovecot ready to receive mail from Postfix via an LMTP unix socket created in Postfix's private chroot. It also activates the sieve plugin, and uses ~/.sieve as a sieve script. The script can be a file or a directory; if it is a directory, ~/.dovecot.sieve will be a symlink pointing to the .sieve file to run. This is a feature I'm not yet using, but if one day I want to try enabling UIs to edit sieve scripts, that part is ready. Delegate to procmail To make sieve scripts that delegate to procmail, I enabled the sieve_extprograms plugin:
 plugin {
   sieve = file:~/.sieve;active=~/.dovecot.sieve
+  sieve_plugins = sieve_extprograms
+  sieve_extensions = +vnd.dovecot.pipe
+  sieve_pipe_bin_dir = /usr/local/lib/dovecot/sieve-pipe
+  sieve_trace_dir = ~/.sieve-trace
+  sieve_trace_level = matching
+  sieve_trace_debug = yes
 }
and then created a script for it:
mkdir -p /usr/local/lib/dovecot/sieve-pipe/
(echo "#!/bin/sh'; echo "exec /usr/bin/procmail") > /usr/local/lib/dovecot/sieve-pipe/procmail
chmod 0755 /usr/local/lib/dovecot/sieve-pipe/procmail
And I can have a sieve script that delegates processing to procmail:
require "vnd.dovecot.pipe";
pipe "procmail";
Activate the postfix side These changes switched local delivery over to Dovecot:
--- a/roles/mailserver/templates/dovecot.conf
+++ b/roles/mailserver/templates/dovecot.conf
@@ -25,6 +25,8 @@
 
+auth_username_format = %Ln
+
 
diff --git a/roles/mailserver/templates/main.cf b/roles/mailserver/templates/main.cf
index d2c515a..d35537c 100644
--- a/roles/mailserver/templates/main.cf
+++ b/roles/mailserver/templates/main.cf
@@ -64,8 +64,7 @@ virtual_alias_domains =
 
-mailbox_command = procmail -a "$EXTENSION"
-mailbox_size_limit = 0
+mailbox_transport = lmtp:unix:private/dovecot-lmtp
 
Without auth_username_format = %Ln dovecot won't be able to understand usernames sent by postfix in my specific setup. Moving rules over to sieve This is mostly straightforward, with the luxury of being able to do it a bit at a time. The last tricky bit was how to call spamc from sieve, as in some situations I reduce system load by running the spamfilter only on a prefiltered selection of incoming emails. For this I enabled the filter directive in sieve:
 plugin {
   sieve = file:~/.sieve;active=~/.dovecot.sieve
   sieve_plugins = sieve_extprograms
-  sieve_extensions = +vnd.dovecot.pipe
+  sieve_extensions = +vnd.dovecot.pipe +vnd.dovecot.filter
   sieve_pipe_bin_dir = /usr/local/lib/dovecot/sieve-pipe
+  sieve_filter_bin_dir = /usr/local/lib/dovecot/sieve-filter
   sieve_trace_dir = ~/.sieve-trace
   sieve_trace_level = matching
   sieve_trace_debug = yes
 }
Then I created a filter script:
mkdir -p /usr/local/lib/dovecot/sieve-filter/
(echo "#!/bin/sh"; echo "exec /usr/bin/spamc") > /usr/local/lib/dovecot/sieve-filter/spamc
chmod 0755 /usr/local/lib/dovecot/sieve-filter/spamc
And now what was previously:
:0 fw
| /usr/bin/spamc
:0
* ^X-Spam-Status: Yes
.spam/
Can become:
require "vnd.dovecot.filter";
require "fileinto";
filter "spamc";
if header :contains "x-spam-level" "**************"  
    discard;
  elsif header :matches "X-Spam-Status" "Yes,*"  
    fileinto "spam";
 
Updates Ansgar mentioned that it's possible to replicate the monthly mailbox using the variables and date extensions, with a hacky trick from the extensions' RFC:
require "date"
require "variables"
if currentdate :matches "month" "*"   set "month" "$ 1 ";  
if currentdate :matches "year" "*"   set "year" "$ 1 ";  
fileinto :create "$ month -$ year ";

2 March 2022

Antoine Beaupré: procmail considered harmful

TL;DR: procmail is a security liability and has been abandoned upstream for the last two decades. If you are still using it, you should probably drop everything and at least remove its SUID flag. There are plenty of alternatives to choose from, and conversion is a one-time, acceptable trade-off.

Procmail is unmaintained procmail is unmaintained. The "Final release", according to Wikipedia, dates back to September 10, 2001 (3.22). That release has shipped in Debian ever since, all the way back to Debian 3.0 "woody", twenty years ago. Debian also ships 25 uploads on top of this, with 3.22-21 shipping the "3.23pre" release that has been rumored since at least November 2001, according to debian/changelog at least:
procmail (3.22-1) unstable; urgency=low
  * New upstream release, which uses the `standard' format for Maildir
    filenames and retries on name collision. It also contains some
    bug fixes from the 3.23pre snapshot dated 2001-09-13.
  * Removed `sendmail' from the Recommends field, since we already
    have `exim' (the default Debian MTA) and `mail-transport-agent'.
  * Removed suidmanager support. Conflicts: suidmanager (<< 0.50).
  * Added support for DEB_BUILD_OPTIONS in the source package.
  * README.Maildir: Do not use locking on the example recipe,
    since it's wrong to do so in this case.
 -- Santiago Vila <sanvila@debian.org>  Wed, 21 Nov 2001 09:40:20 +0100
All Debian suites from buster onwards ship the 3.22-26 release, although the maintainer just pushed a 3.22-27 release to fix a seven-year-old null pointer dereference, after this article was drafted. Procmail is also shipped in all major distributions: Fedora and its derivatives, Debian derivatives, Gentoo, Arch, FreeBSD, OpenBSD. We all seem to be ignoring this problem. The upstream website (http://procmail.org/) has been down since about 2015, according to Debian bug #805864, with no change since. In effect, every distribution is currently maintaining its own fork of this dead program. Note that, after filing a bug to keep Debian from shipping procmail in a stable release again, I was told that the Debian maintainer is apparently in contact with the upstream. And, surprise! They still plan to release that fabled 3.23 release, which has now been in "pre-release" for all those twenty years. In fact, it turns out that 3.23 is considered released already, and that the procmail author actually pushed a 3.24 release, codenamed "Two decades of fixes". That amounts to 25 commits since 3.23pre, some of which address serious security issues, but none of which address fundamental issues with the code base.

Procmail is insecure By default, procmail is installed SUID root:mail in Debian. There's no debconf or pre-seed setting that can change this. There have been two bug reports against the Debian package to make this configurable (298058, 264011), but both were closed to say that, basically, you should use dpkg-statoverride to change the permissions on the binary. So if anything, you should immediately run this command on any host that you have procmail installed on:
dpkg-statoverride --update --add root root 0755 /usr/bin/procmail
Note that this might break email delivery. It might also not work at all, thanks to usrmerge. Not sure. Yes, everything is on fire. This is fine. In my opinion, even assuming we keep procmail in Debian, that default should be reversed. It should be up to people installing procmail to assign it those dangerous permissions, after careful consideration of the risk involved. The last maintainer of procmail explicitly advised us (in that null pointer dereference bug) and other projects (e.g. OpenBSD, in [2]) to stop shipping it, back in 2014. Quote:
Executive summary: delete the procmail port; the code is not safe and should not be used as a basis for any further work.
I just read some of the code again this morning, after the original author claimed that procmail was active again. It's still littered with bizarre macros like:
#define bit_set(name,which,value) \
  (value?(name[bit_index(which)]|=bit_mask(which)):\
  (name[bit_index(which)]&=~bit_mask(which)))
... from regexp.c, line 66 (yes, that's a custom regex engine). Or this one:
#define jj  (aleps.au.sopc)
It uses insecure functions like strcpy extensively. malloc() is thrown around gotos like it's 1984 all over again. (To be fair, it has been feeling like 1984 a lot lately, but that's another matter entirely.) That null pointer deref bug? It's fixed upstream now, in this commit merged a few hours ago, which I presume might be in response to my request to remove procmail from Debian. So while that's nice, this is just the tip of the iceberg. I speculate that one could easily find an exploitable crash in procmail just by running it through a fuzzer. But I don't need to speculate: procmail had, for years, serious security issues that could possibly lead to root privilege escalation, remotely exploitable if procmail is (as it's designed to do) exposed to the network. Maybe I'm overreacting. Maybe the procmail author will go through the code base and do a proper rewrite. But I don't think that's what is in the cards right now. What I expect will happen next is that people will start fuzzing procmail and throw an uncountable number of bug reports at it, which will get fixed in a trickle while never fixing the underlying, serious design flaws behind procmail.

Procmail has better alternatives The reason this is so frustrating is that there are plenty of modern alternatives to procmail which do not suffer from those problems. Alternatives to procmail(1) itself are typically part of mail servers. For example, Dovecot has its own LDA which implements the standard Sieve language (RFC 5228). (Interestingly, Sieve was published as RFC 3028 in 2001, before procmail was formally abandoned.) Courier also has "maildrop", which has its own filtering mechanism, and there is fdm (2007), which is a fetchmail and procmail replacement. Update: there's also mailprocessing, which is not an LDA, but processes an existing folder. It was, however, specifically designed to replace complex Procmail rules. But the procmail package, of course, doesn't just ship procmail; that would just be too easy. It ships mailstat(1), which we could probably ignore because it only parses procmail log files. But more importantly, it also ships:
  • lockfile(1) - conditional semaphore-file creator
  • formail(1) - mail (re)formatter
lockfile(1) already has a somewhat acceptable replacement in the form of flock(1), part of util-linux (which is Essential, so installed on any normal Debian system). It might not be a direct drop-in replacement, but it should be close enough. formail(1) is similar: the courier maildrop package ships reformail(1) which is, presumably, a rewrite of formail. It's unclear if it's a drop-in replacement, but it should probably be possible to port uses of formail to it easily.
Update: the maildrop package ships a SUID root binary (two, even). So if you want only reformail(1), you might want to disable that with:
dpkg-statoverride --update --add root root 0755 /usr/bin/lockmail.maildrop 
dpkg-statoverride --update --add root root 0755 /usr/bin/maildrop
It would be perhaps better to have reformail(1) as a separate package, see bug 1006903 for that discussion.
The real challenge is, of course, migrating those old .procmailrc recipes to Sieve (basically). I added a few examples in the appendix below. You might notice the Sieve examples are easier to read, which is a nice added bonus.

Conclusion There is really, absolutely, no reason to keep procmail in Debian, nor should it be used anywhere at this point. It's a great part of our computing history. May it be kept forever in our museums and historical archives, but not in Debian, and certainly not in an actual release. It's just a bomb waiting to go off. It is irresponsible for distributions to keep shipping obsolete and insecure software like this to unsuspecting users. Note that I am grateful to the author, I really am: I used procmail for decades and it served me well. But now, it's time to move on, not to bring it back from the dead.

Appendix

Previous work It's really weird to have to write this blog post. Back in 2016, I rebuilt my mail setup at home and, to my horror, discovered that procmail had been abandoned for 15 years at that point, thanks to that LWN article from 2010. I would have thought that I was the only weirdo still running procmail after all those years and felt kind of embarrassed to only "now" switch to the more modern (and, honestly, awesome) Sieve language. But no. Since then, Debian shipped three major releases (stretch, buster, and bullseye), all with the same vulnerable procmail release. Then, in early 2022, I found that, at work, we actually had procmail installed everywhere, possibly because userdir-ldap was using it for lockfile until 2019. I sent a patch to fix that and scrambled to get rid of procmail everywhere. That took about a day. But many other sites are now in that situation, possibly not imagining they have this glaring security hole in their infrastructure.

Procmail to Sieve recipes I'll collect a few Sieve equivalents to procmail recipes here. If you have any additions, do contact me. All Sieve examples below assume you drop the file in ~/.dovecot.sieve.

deliver mail to "plus" extension folder Say you want to deliver user+foo@example.com to the folder foo. You might write something like this in procmail:
MAILDIR=$HOME/Maildir/
DEFAULT=$MAILDIR
LOGFILE=$HOME/.procmail.log
VERBOSE=off
EXTENSION=$1            # Need to rename it - ?? does not like $1 nor 1
:0
* EXTENSION ?? [a-zA-Z0-9]+
        .$EXTENSION/
That, in sieve language, would be:
require ["variables", "envelope", "fileinto", "subaddress"];
########################################################################
# wildcard +extension
# https://doc.dovecot.org/configuration_manual/sieve/examples/#plus-addressed-mail-filtering
if envelope :matches :detail "to" "*" {
  # Save name in ${name} in all lowercase
  set :lower "name" "${1}";
  fileinto "${name}";
  stop;
}

Subject into folder This would file all mails with a Subject: line having FreshPorts in it into the freshports folder, and mails from alternc.org mailing lists into the alternc folder:
:0
## mailing list freshports
* ^Subject.*FreshPorts.*
.freshports/
:0
## mailing list alternc
* ^List-Post.*mailto:.*@alternc.org.*
.alternc/
Equivalent Sieve:
if header :contains "subject" "FreshPorts"  
    fileinto "freshports";
  elsif header :contains "List-Id" "alternc.org"  
    fileinto "alternc";
 

Mail sent to root to a reports folder This double rule:
:0
* ^Subject: Cron
* ^From: .*root@
.rapports/
Would look something like this in Sieve:
if header :comparator "i;octet" :contains "Subject" "Cron" {
  if header :regex :comparator "i;octet" "From" ".*root@" {
        fileinto "rapports";
  }
}
Note that this is what the automated converter does (below). It's not very readable, but it works.

Bulk email I didn't have an equivalent of this in procmail, but that's something I did in Sieve:
if header :contains "Precedence" "bulk"  
    fileinto "bulk";
 

Any mailing list This is another rule I didn't have in procmail but I found handy and easy to do in Sieve:
if exists "List-Id"  
    fileinto "lists";
 

This or that I wouldn't remember how to do this in procmail either, but that's an easy one in Sieve:
if anyof (header :contains "from" "example.com",
           header :contains ["to", "cc"] "anarcat@example.com")  
    fileinto "example";
 
You can even pile up a bunch of options together to have one big rule with multiple patterns:
if anyof (exists "X-Cron-Env",
          header :contains ["subject"] ["security run output",
                                        "monthly run output",
                                        "daily run output",
                                        "weekly run output",
                                        "Debian Package Updates",
                                        "Debian package update",
                                        "daily mail stats",
                                        "Anacron job",
                                        "nagios",
                                        "changes report",
                                        "run output",
                                        "[Systraq]",
                                        "Undelivered mail",
                                        "Postfix SMTP server: errors from",
                                        "backupninja",
                                        "DenyHosts report",
                                        "Debian security status",
                                        "apt-listchanges"
                                        ],
           header :contains "Auto-Submitted" "auto-generated",
           envelope :contains "from" ["nagios@",
                                      "logcheck@",
                                      "root@"])
     
    fileinto "rapports";
 

Automated script There is a procmail2sieve.pl script floating around, and mentioned in the dovecot documentation. It didn't work very well for me: I could use it for small things, but I mostly wrote the sieve file from scratch.

Progressive migration Enrico Zini has progressively migrated his procmail setup to Sieve in a clever way: he hooked procmail inside sieve so that he could deliver to the Dovecot LDA and progressively migrate rules one by one, without having a "flag day". See this explanatory blog post for the details, which also shows how to configure Dovecot as an LMTP server with Postfix.

Other examples The Dovecot sieve examples are numerous and also quite useful. At the time of writing, they include virus scanning and spam filtering, vacation auto-replies, includes, archival, and flags.

Harmful considered harmful I am aware that the "considered harmful" title has a long and controversial history, being considered harmful in itself (by some people who are obviously not afraid of contradictions). I have nevertheless deliberately chosen that title, partly to make sure this article gets maximum visibility, but more specifically because I do not have doubts at this moment that procmail is, clearly, a bad idea at this moment in history.

Developing story I must also add that, incredibly, this story has changed while writing it. This article is derived from this bug I filed in Debian to, quite frankly, kick procmail out of Debian. But filing the bug had the interesting effect of pushing the upstream into action: as mentioned above, they have apparently made a new release and merged a bunch of patches in a new git repository. This doesn't change much of the above, at this moment. If anything significant comes out of this effort, I will try to update this article to reflect the situation. I am actually happy to retract the claims in this article if it turns out that procmail is a stellar example of defensive programming and survives fuzzing attacks. But at this moment, I'm pretty confident that will not happen, at least not in scope of the next Debian release cycle.

23 November 2021

Enrico Zini: Really lossy compression of JPEG

Suppose you have a tool that archives images, or scientific data, and it has a test suite. It would be good to collect sample files for the test suite, but they are often so big that one can't really bloat the repository with them. But does the test suite need everything that is in those files? Not necessarily: for example, if one is testing code that reads EXIF metadata, one doesn't care about what is in the image, so the image data can be blanked out while the metadata is kept. That technique works extremely well. I can take GRIB files that are several megabytes in size, zero out their data payload, and get nice 1Kb samples for the test suite. I've started to collect and organise the little hacks I use for this into a tool I called mktestsample:
$ mktestsample -v samples1/*
2021-11-23 20:16:32 INFO common samples1/cosmo_2d+0.grib: size went from 335168b to 120b
2021-11-23 20:16:32 INFO common samples1/grib2_ifs.arkimet: size went from 4993448b to 39393b
2021-11-23 20:16:32 INFO common samples1/polenta.jpg: size went from 3191475b to 94517b
2021-11-23 20:16:32 INFO common samples1/test-ifs.grib: size went from 1986469b to 4860b
Those are massive savings, but I'm not satisfied with those almost 94Kb of JPEG:
$ ls -la samples1/polenta.jpg
-rw-r--r-- 1 enrico enrico 94517 Nov 23 20:16 samples1/polenta.jpg
$ gzip samples1/polenta.jpg
$ ls -la samples1/polenta.jpg.gz
-rw-r--r-- 1 enrico enrico 745 Nov 23 20:16 samples1/polenta.jpg.gz
I believe I did all I could: completely blank out the image data, set the quality to zero, maximize subsampling, and tweak quantization to throw everything away. Still, the result is a 94Kb file that can be gzipped down to 745 bytes. Is there something I'm missing? I suppose JPEG is better at storing an image than at storing the lack of an image. I cannot really complain :) I can still commit compressed samples of large images to a git repository, taking up very little space indeed. That's really nice!
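For what it's worth, here is a hedged sketch of the blanking idea using Pillow; this is not the actual mktestsample code, and the function and file names are made up for illustration. It keeps the EXIF block, replaces the pixels with a black canvas, and saves with the lowest quality and the most aggressive chroma subsampling:

from PIL import Image

def blank_jpeg(src: str, dst: str) -> None:
    """Rewrite src as dst, keeping EXIF but discarding the picture."""
    with Image.open(src) as img:
        exif = img.info.get("exif", b"")      # the metadata the tests care about
        blank = Image.new("RGB", img.size)    # an all-black payload
        # Lowest quality and 4:2:0 subsampling throw away most of the detail
        blank.save(dst, "JPEG", quality=1, subsampling=2,
                   optimize=True, exif=exif)

# blank_jpeg("samples1/polenta.jpg", "samples1/polenta-blank.jpg")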

8 November 2021

Enrico Zini: An educational debugging session

This morning we realised that a test case failed on Fedora 34 only (the link is in Italian) and we set to debugging. The initial analysis This is the initial reproducer:
$ PROJ_DEBUG=3 python setup.py test
test_recipe (tests.test_litota3.TestLITOTA3NordArkimetIFS) ... pj_open_lib(proj.db): call fopen(/lib64/../share/proj/proj.db) - succeeded
proj_create: Open of /lib64/../share/proj/proj.db failed
pj_open_lib(proj.db): call fopen(/lib64/../share/proj/proj.db) - succeeded
proj_create: no database context specified
Cannot instantiate source_crs
EXCEPTION in py_coast(): ProjP: cannot create crs to crs from [EPSG:4326] to [+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +over +units=m +no_defs]
ERROR
Note that opening /lib64/../share/proj/proj.db sometimes succeeds, sometimes fails. It's some kind of Schrödinger path, which works or not depending on how you observe it:
# ls -lad /lib64
lrwxrwxrwx 1 1000 1000 9 Jan 26  2021 /lib64 -> usr/lib64
$ ls -la /lib64/../share/proj/proj.db
-rw-r--r-- 1 root root 8925184 Jan 28  2021 /lib64/../share/proj/proj.db
$ cd /lib64/../share/proj/
$ cd /lib64
$ cd ..
$ cd share
-bash: cd: share: No such file or directory
And indeed, stat(2) finds it, and sqlite doesn't (the file is a sqlite database):
$ stat /lib64/../share/proj/proj.db
  File: /lib64/../share/proj/proj.db
  Size: 8925184     Blocks: 17432      IO Block: 4096   regular file
Device: 33h/51d Inode: 56907       Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-11-08 14:09:12.334350779 +0100
Modify: 2021-01-28 05:38:11.000000000 +0100
Change: 2021-11-08 13:42:51.758874327 +0100
 Birth: 2021-11-08 13:42:51.710874051 +0100
$ sqlite3 /lib64/../share/proj/proj.db
Error: unable to open database "/lib64/../share/proj/proj.db": unable to open database file
A minimal reproducer Later on we started stripping layers of code towards a minimal reproducer: here it is. It works or doesn't work depending on whether proj is linked explicitly, or via MagPlus:
$ cat tc.cc
#include <magics/ProjP.h>
int main() {
    magics::ProjP p("EPSG:4326", "+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +over +units=m +no_defs");
    return 0;
}
$ g++ -o tc  tc.cc -I/usr/include/magics  -lMagPlus
$ ./tc
proj_create: Open of /lib64/../share/proj/proj.db failed
proj_create: no database context specified
terminate called after throwing an instance of 'magics::MagicsException'
  what():  ProjP: cannot create crs to crs from [EPSG:4326] to [+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +over +units=m +no_defs]
Aborted (core dumped)
$ g++ -o tc  tc.cc -I/usr/include/magics -lproj  -lMagPlus
$ ./tc
What is going on here? A difference between the two is the path used to link to libproj.so:
$ ldd ./tc | grep proj
    libproj.so.19 => /lib64/libproj.so.19 (0x00007fd4919fb000)
$ g++ -o tc  tc.cc -I/usr/include/magics   -lMagPlus
$ ldd ./tc | grep proj
    libproj.so.19 => /lib64/../lib64/libproj.so.19 (0x00007f6d1051b000)
Common sense screams that this should not matter, but we chased an intuition and found that one of the ways proj looks for its database is relative to its shared library. Indeed, gdb in hand, that dladdr call returns /lib64/../lib64/libproj.so.19. From /lib64/../lib64/libproj.so.19, proj strips two path components from the end, presumably to go from something like /something/usr/lib/libproj.so to /something/usr. So, dladdr returns /lib64/../lib64/libproj.so.19, which becomes /lib64/../, which becomes /lib64/../share/proj/proj.db, which exists on the file system and is used as the path to the database. But depending on how you look at it, that path might or might not be valid: it passes the stat(2) check that stops the lookup for candidate paths, but sqlite is unable to open it. Why does the other path work? When libproj.so is linked the other way, dladdr returns /lib64/libproj.so.19, which becomes /share/proj/proj.db, which doesn't exist, which triggers a fallback to a PROJ_LIB constant defined at compile time, a path that works no matter how you look at it. Why that weird path with libMagPlus? To complete the picture, we found that libMagPlus.so is packaged with an rpath set, which is known to cause trouble:
# readelf -d /usr/lib64/libMagPlus.so | grep rpath
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../lib64]
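To make the path arithmetic described above concrete, here is a small sketch in Python of the behaviour we observed (an illustration, not proj's actual code):
import os.path
def proj_db_candidate(libpath: str) -> str:
    # Strip two path components from the dladdr result, then look under share/proj
    prefix = os.path.dirname(os.path.dirname(libpath))
    return os.path.join(prefix, "share/proj/proj.db")
print(proj_db_candidate("/lib64/../lib64/libproj.so.19"))  # /lib64/../share/proj/proj.db
print(proj_db_candidate("/lib64/libproj.so.19"))           # /share/proj/proj.db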
The workaround We found that one can set PROJ_LIB in the environment to override the normal proj database lookup. Building on that, we came up with a simple way to override it on Fedora 34 only:
    if distro is not None and distro.linux_distribution()[:2] == ("Fedora", "34") and "PROJ_LIB" not in os.environ:
        self.env_overrides["PROJ_LIB"] = "/usr/share/proj/"
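For context, this is roughly how such an override can be applied when spawning the test suite; self.env_overrides and the harness around it are not shown in the post, so this is only an illustration:
import os
import subprocess
import sys
# Merge the override into a copy of the environment before running the tests
env = dict(os.environ)
env["PROJ_LIB"] = "/usr/share/proj/"
subprocess.run([sys.executable, "setup.py", "test"], env=env, check=True)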
This has been a most edifying and educational debugging session, with only the necessary modicum of curses and swearwords. Working in a team of excellent people really helps.

2 November 2021

Enrico Zini: help2man and subcommands

help2man is quite nice for autogenerating manpages from command line help, making sure that they stay up to date as command line options evolve. It works quite well, except for commands with subcommands, like Python programs that use argparse's add_subparsers. So, here's a quick hack that calls help2man for each subcommand, and stitches everything together in a simple manpage.
#!/usr/bin/python3
import re
import shutil
import sys
import subprocess
import tempfile
# TODO: move to argparse
command = sys.argv[1]
# Use setup.py to get the program version
res = subprocess.run([sys.executable, "setup.py", "--version"], stdout=subprocess.PIPE, text=True, check=True)
version = res.stdout.strip()
# Call the main commandline help to get a list of subcommands
res = subprocess.run([sys.executable, command, "--help"], stdout=subprocess.PIPE, text=True, check=True)
subcommands = re.sub(r'^.+\{(.+)\}.+$', r'\1', res.stdout, flags=re.DOTALL).split(',')
# Generate a help2man --include file with an extra section for each subcommand
with tempfile.NamedTemporaryFile("wt") as tf:
    print("[>DESCRIPTION]", file=tf)
    for subcommand in subcommands:
        res = subprocess.run(
                ["help2man", f"--name={command}", "--section=1",
                 "--no-info", "--version-string=dummy", f"./{command} {subcommand}"],
                stdout=subprocess.PIPE, text=True, check=True)
        subcommand_doc = re.sub(r'^.+.SH DESCRIPTION', '', res.stdout, flags=re.DOTALL)
        print(".SH ", subcommand.upper(), " SUBCOMMAND", file=tf)
        tf.write(subcommand_doc)
    with open(f"{command}.1.in", "rt") as fd:
        shutil.copyfileobj(fd, tf)
    tf.flush()
    # Call help2man on the main command line help, with the extra include file
    # we just generated
    subprocess.run(
            ["help2man", f"--include={tf.name}", f"--name={command}",
             "--section=1", "--no-info", f"--version-string={version}",
             "--output=arkimaps.1", "./arkimaps"],
            check=True)
            check=True)
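For reference, the regular expression near the top of the script relies on argparse listing subparser names in braces in its --help output. A small check of what it extracts, using hypothetical subcommand names:
import re
# Hypothetical --help output; real argparse usage lines look similar
help_output = "usage: arkimaps [-h] {process,render,dispatch} ...\n\noptional arguments:\n  -h, --help  show this help message and exit"
subcommands = re.sub(r'^.+\{(.+)\}.+$', r'\1', help_output, flags=re.DOTALL).split(',')
print(subcommands)  # ['process', 'render', 'dispatch']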

22 October 2021

Enrico Zini: Scanning for imports in Python scripts

I had to package a nontrivial Python codebase, and I needed to put dependencies in setup.py. I could do git grep -h import | sort -u, then review the output by hand, but I lacked the motivation for it. Much better to take a stab at solving the general problem. The result is at https://github.com/spanezz/python-devel-tools. One fun part is scanning a directory tree, using ast to find import statements scattered around the code:
class Scanner:
    def __init__(self):
        self.names: Set[str] = set()
    def scan_dir(self, root: str):
        for dirpath, dirnames, filenames, dir_fd in os.fwalk(root):
            for fn in filenames:
                if fn.endswith(".py"):
                    with dirfd_open(fn, dir_fd=dir_fd) as fd:
                        self.scan_file(fd, os.path.join(dirpath, fn))
                st = os.stat(fn, dir_fd=dir_fd)
                if st.st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
                    with dirfd_open(fn, dir_fd=dir_fd) as fd:
                        try:
                            lead = fd.readline()
                        except UnicodeDecodeError:
                            continue
                        if re_python_shebang.match(lead):
                            fd.seek(0)
                            self.scan_file(fd, os.path.join(dirpath, fn))
    def scan_file(self, fd: TextIO, pathname: str):
        log.info("Reading file %s", pathname)
        try:
            tree = ast.parse(fd.read(), pathname)
        except SyntaxError as e:
            log.warning("%s: file cannot be parsed", pathname, exc_info=e)
            return
        self.scan_tree(tree)
    def scan_tree(self, tree: ast.AST):
        for stm in tree.body:
            if isinstance(stm, ast.Import):
                for alias in stm.names:
                    if not isinstance(alias.name, str):
                        print("NAME", repr(alias.name), stm)
                    self.names.add(alias.name)
            elif isinstance(stm, ast.ImportFrom):
                if stm.module is not None:
                    self.names.add(stm.module)
            elif hasattr(stm, "body"):
                self.scan_tree(stm)
Another fun part is grouping the imported module names by where in sys.path they have been found:
    scanner = Scanner()
    scanner.scan_dir(args.dir)
    sys.path.append(args.dir)
    by_sys_path: Dict[str, List[str]] = collections.defaultdict(list)
    for name in sorted(scanner.names):
        spec = importlib.util.find_spec(name)
        if spec is None or spec.origin is None:
            by_sys_path[""].append(name)
        else:
            for sp in sys.path:
                if spec.origin.startswith(sp):
                    by_sys_path[sp].append(name)
                    break
            else:
                by_sys_path[spec.origin].append(name)
    for sys_path, names in sorted(by_sys_path.items()):
        print(f"{sys_path or 'unidentified'}:")
        for name in names:
            print(f"  {name}")
An example. It's kind of nice how it can at least tell apart stdlib modules so one doesn't need to read through those:
$ ./scan-imports  /himblick
unidentified:
  changemonitor
  chroot
  cmdline
  mediadir
  player
  server
  settings
  static
  syncer
  utils
 /himblick:
  himblib.cmdline
  himblib.host_setup
  himblib.player
  himblib.sd
/usr/lib/python3.9:
  __future__
  argparse
  asyncio
  collections
  configparser
  contextlib
  datetime
  io
  json
  logging
  mimetypes
  os
  pathlib
  re
  secrets
  shlex
  shutil
  signal
  subprocess
  tempfile
  textwrap
  typing
/usr/lib/python3/dist-packages:
  asyncssh
  parted
  progressbar
  pyinotify
  setuptools
  tornado
  tornado.escape
  tornado.httpserver
  tornado.ioloop
  tornado.netutil
  tornado.web
  tornado.websocket
  yaml
built-in:
  sys
  time
Maybe such a tool already exists and works much better than this? From a quick search I didn't find it, and it was fun to (re)invent it. Updates: Jakub Wilk pointed me to an old python-modules script that finds Debian dependencies. The AST scanning code should be refactored to use ast.NodeVisitor.
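As a hint of what that refactor could look like, here is a minimal sketch using ast.NodeVisitor (my own sketch, not code from the repository); the generic traversal also picks up imports in branches, such as else and except bodies, that recursing only into .body would miss:
import ast
from typing import Set
class ImportVisitor(ast.NodeVisitor):
    # Collect imported module names from anywhere in the tree
    def __init__(self) -> None:
        self.names: Set[str] = set()
    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            self.names.add(alias.name)
    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        if node.module is not None:
            self.names.add(node.module)
# Usage on a hypothetical example.py:
visitor = ImportVisitor()
with open("example.py") as fd:
    visitor.visit(ast.parse(fd.read(), "example.py"))
print(sorted(visitor.names))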
